Learning a Static Analyzer from Data
To be practically useful, modern static analyzers must precisely model the
effects of both the statements of the programming language and the frameworks
used by the program under analysis. While important, manually addressing these
challenges is difficult for at least two reasons: (i) the effects on the
overall analysis can be non-trivial, and (ii) as the size and complexity of
modern libraries increase, so does the number of cases the analysis must handle.
In this paper we present a new, automated approach for creating static
analyzers: instead of manually providing the various inference rules of the
analyzer, the key idea is to learn these rules from a dataset of programs. Our
method consists of two ingredients: (i) a synthesis algorithm capable of
learning a candidate analyzer from a given dataset, and (ii) a counter-example
guided learning procedure which generates new programs beyond those in the
initial dataset, critical for discovering corner cases and ensuring the learned
analysis generalizes to unseen programs.
We implemented and instantiated our approach to the task of learning
JavaScript static analysis rules for a subset of points-to analysis and for
allocation sites analysis. These are challenging yet important problems that
have received significant research attention. We show that our approach is
effective: our system automatically discovered practical and useful inference
rules for many cases that are tricky to manually identify and are missed by
state-of-the-art, manually tuned analyzers.
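The two ingredients described above (synthesis from a dataset, then counter-example guided growth of that dataset) can be illustrated with a toy loop. This is a minimal sketch under strong assumptions, not the paper's system: the "analyzer" is just a function picked from a fixed candidate list, the oracle is a ground-truth function, and the names `synthesize`, `find_counterexample`, and `learn` are hypothetical.

```python
def synthesize(candidates, dataset):
    """Return the first (name, fn) candidate consistent with every example."""
    for name, fn in candidates:
        if all(fn(x) == y for x, y in dataset):
            return name, fn
    return None


def find_counterexample(fn, oracle, search_space):
    """Look beyond the dataset for an input where the candidate is wrong."""
    for x in search_space:
        if fn(x) != oracle(x):
            return x
    return None


def learn(candidates, oracle, initial_inputs, search_space):
    """Alternate synthesis and counter-example search until none is found."""
    dataset = [(x, oracle(x)) for x in initial_inputs]
    while True:
        result = synthesize(candidates, dataset)
        if result is None:
            return None, dataset  # no candidate fits the data
        name, fn = result
        cex = find_counterexample(fn, oracle, search_space)
        if cex is None:
            return name, dataset  # candidate survives every probe
        # The counter-example rules out the current candidate on the next round.
        dataset.append((cex, oracle(cex)))


# Toy usage: the initial data cannot tell identity from abs; a negative
# counter-example can.
toy_candidates = [("identity", lambda x: x), ("square", lambda x: x * x), ("abs", abs)]
learned_name, final_dataset = learn(toy_candidates, abs, [0, 1], range(-3, 4))
```

Each counter-example eliminates at least the current candidate, so the loop runs at most once per candidate.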
Automatic Induction of Classification Rules from Examples Using N-Prism
www.dis.port.ac.uk/~bramerma One of the key technologies of data mining is the automatic induction of rules from examples, particularly the induction of classification rules. Most work in this field has concentrated on the generation of such rules in the intermediate form of decision trees. An alternative approach is to generate modular classification rules directly from the examples. This paper seeks to establish a revised form of the rule generation algorithm Prism as a credible candidate for use in the automatic induction of classification rules from examples in practical domains where noise may be present and where predicting the classification for previously unseen instances is the primary focus of attention.
Impact of Mandatory Diversity Training: Lessons from a Private University
Attendance at diversity training programs is often dictated by management, and participants find themselves caught between their genuine desire to broaden their understanding of the subject and resentment at being forced to do so. The outcomes of these mandatory training programs have not been systematically assessed. This study looks at the cognitive, attitudinal, and behavioral impacts of attending such a program and finds valuable lessons learned and cautious room for optimism.
On the predictability of domain-independent temporal planners
Temporal planning is a research discipline that addresses the problem of generating a totally or partially ordered sequence of actions that transforms the environment from some initial state to a desired goal state, while taking into account time constraints and the durations of actions. Because it can describe and address temporal constraints, temporal planning is of critical importance for a wide range of real-world applications. Predicting the performance of temporal planners can lead to significant improvements in the area, as planners can then be combined in order to boost performance on a given set of problem instances. This paper investigates the predictability of state-of-the-art temporal planners by introducing a new set of temporal-specific features and exploiting them to generate classification and regression empirical performance models (EPMs) of the considered planners. EPMs are also tested with regard to their ability to select the most promising planner for efficiently solving a given temporal planning problem. Our extensive empirical analysis indicates that the introduced set of features makes it possible to generate EPMs that can effectively perform algorithm selection, and that the use of EPMs is therefore a promising direction for improving the state of the art of temporal planning, hence fostering the use of planning in real-world applications.
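As a rough illustration of algorithm selection with regression EPMs, the sketch below predicts a runtime for each planner from instance features and picks the planner with the lowest prediction. It rests on several assumptions: a k-nearest-neighbour regressor stands in for whatever models the paper trains, and the feature vectors, runtimes, and planner names (`planner_a`, `planner_b`) are invented for the example.

```python
import math


def knn_predict(train, features, k=1):
    """Predict runtime as the mean runtime of the k nearest training instances.

    `train` is a list of (feature_vector, runtime_seconds) pairs.
    """
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], features))[:k]
    return sum(runtime for _, runtime in nearest) / len(nearest)


def select_planner(models, features, k=1):
    """Return the planner with the lowest predicted runtime, plus all predictions."""
    predictions = {
        name: knn_predict(train, features, k) for name, train in models.items()
    }
    best = min(predictions, key=predictions.get)
    return best, predictions


# Hypothetical training data: one list of (features, runtime) per planner.
toy_models = {
    "planner_a": [((0.0, 0.0), 10.0), ((1.0, 1.0), 100.0)],
    "planner_b": [((0.0, 0.0), 50.0), ((1.0, 1.0), 5.0)],
}
choice, predicted = select_planner(toy_models, (0.1, 0.1))
```

A classification EPM would instead predict the best planner label directly; the regression form shown here keeps the per-planner runtime estimates available for inspection.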
Data mining via ILP: The application of progol to a
As far as this author is aware, this is the first paper to describe the application of Progol to enantioseparations. A scheme is proposed for data mining a relational database of published enantioseparations using Progol. The application of the scheme is described and a preliminary assessment of the usefulness of the resulting generalisations is made using their accuracy, size, ease of interpretation and chemical justification.
Semantic Context Forests for Learning-Based Knee Cartilage Segmentation in 3D MR Images
The automatic segmentation of human knee cartilage from 3D MR images is a
useful yet challenging task due to the thin sheet structure of the cartilage
with diffuse boundaries and inhomogeneous intensities. In this paper, we
present an iterative multi-class learning method to segment the femoral, tibial
and patellar cartilage simultaneously, which effectively exploits the spatial
contextual constraints between bone and cartilage, and also between different
cartilages. First, because cartilage grows only in certain areas of the
corresponding bone surface, we extract distance features not only to the
surface of the bone but, more informatively, to densely registered anatomical
landmarks on the bone surface. Second, we introduce a set of iterative
discriminative classifiers in which, at each iteration, probability comparison
features are constructed from the class confidence maps derived by previously
learned classifiers. These features automatically embed the semantic
context information between different cartilages of interest. Validated on a
total of 176 volumes from the Osteoarthritis Initiative (OAI) dataset, the
proposed approach demonstrates high robustness and accuracy of segmentation in
comparison with existing state-of-the-art MR cartilage segmentation methods.
Comment: MICCAI 2013: Workshop on Medical Computer Vision
ForestHash: Semantic Hashing With Shallow Random Forests and Tiny Convolutional Networks
Hash codes are efficient data representations for coping with the ever
growing amounts of data. In this paper, we introduce a random forest semantic
hashing scheme that embeds tiny convolutional neural networks (CNN) into
shallow random forests, with near-optimal information-theoretic code
aggregation among trees. We start with a simple hashing scheme, where random
trees in a forest act as hashing functions by setting `1' for the visited tree
leaf, and `0' for the rest. We show that traditional random forests fail to
generate hashes that preserve the underlying similarity between the trees,
rendering the random forests approach to hashing challenging. To address this,
we propose to first randomly group arriving classes at each tree split node
into two groups, obtaining a significantly simplified two-class classification
problem, which can be handled using a light-weight CNN weak learner. Such a
random class grouping scheme enables code uniqueness by forcing each class to
share its code with different classes in different trees. A non-conventional
low-rank loss is further adopted for the CNN weak learners to encourage code
consistency by minimizing intra-class variations and maximizing inter-class
distance for the two random class groups. Finally, we introduce an
information-theoretic approach for aggregating codes of individual trees into a
single hash code, producing a near-optimal unique hash for each class. The
proposed approach significantly outperforms state-of-the-art hashing methods
for image retrieval tasks on large-scale public datasets, and performs at
the level of other state-of-the-art image classification techniques while
using a more compact, efficient, and scalable representation. This work
proposes a principled and robust procedure to train and deploy in parallel an
ensemble of light-weight CNNs, instead of simply going deeper.
Comment: Accepted to ECCV 201
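The "simple hashing scheme" mentioned in the abstract, where each tree contributes a 1 for the visited leaf and 0 for every other leaf, can be sketched as follows. This is only the baseline idea under stated assumptions: the trees use random axis-aligned thresholds instead of trained splits or CNN weak learners, no code aggregation is performed, and the helper names (`build_random_tree`, `leaf_index`, `forest_hash`) are hypothetical.

```python
import random


def build_random_tree(n_features, depth, rng):
    """Complete binary tree in heap layout; each internal node is a
    (feature_index, threshold) pair. Thresholds are random, not learned."""
    return [
        (rng.randrange(n_features), rng.uniform(0.0, 1.0))
        for _ in range(2 ** depth - 1)
    ]


def leaf_index(tree, x, depth):
    """Route x from the root to a leaf and return the leaf's position
    in the range 0 .. 2**depth - 1."""
    node = 0
    for _ in range(depth):
        feature, threshold = tree[node]
        go_right = 1 if x[feature] > threshold else 0
        node = 2 * node + 1 + go_right
    return node - (2 ** depth - 1)


def forest_hash(forest, x, depth):
    """Concatenate one indicator block per tree: 1 at the visited leaf,
    0 at every other leaf."""
    code = []
    for tree in forest:
        bits = [0] * (2 ** depth)
        bits[leaf_index(tree, x, depth)] = 1
        code.extend(bits)
    return code


# Toy usage: 4 trees of depth 2 over 3 features give a 16-bit code.
toy_rng = random.Random(0)
toy_depth = 2
toy_forest = [build_random_tree(3, toy_depth, toy_rng) for _ in range(4)]
toy_code = forest_hash(toy_forest, [0.2, 0.7, 0.5], toy_depth)
```

Every code produced this way has exactly one set bit per tree, which is part of why the abstract goes on to describe a separate aggregation step for producing a compact final hash.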
Determining appropriate approaches for using data in feature selection
Feature selection is increasingly important in data analysis and machine learning in the big data era. However, how to use the data in feature selection, i.e. whether to use ALL or PART of a dataset, has become a serious and tricky issue. Whilst the conventional practice of using all the data in feature selection may lead to selection bias, using part of the data may, on the other hand, lead to underestimating the relevant features under some conditions. This paper investigates these two strategies systematically in terms of reliability and effectiveness, and then determines their suitability for datasets with different characteristics. Reliability is measured by the Average Tanimoto Index and the Inter-method Average Tanimoto Index, and effectiveness is measured by the mean generalisation accuracy of classification. The computational experiments are carried out on ten real-world benchmark datasets and fourteen synthetic datasets. The synthetic datasets are generated with a pre-set number of relevant features and varied numbers of irrelevant features and instances, with different levels of noise added. The results indicate that the PART approach is more effective in reducing the bias when the dataset is small, but starts to lose its advantage as the dataset size increases.
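To make the reliability measure concrete, the sketch below computes the Tanimoto index between two selected feature subsets (intersection size over union size) and averages it over all pairs of subsets. The abstract does not give the exact formulas, so the pairwise averaging and the function names here are assumptions, not the paper's definitions.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto index of two feature subsets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # two empty selections are treated as identical
    return len(a & b) / len(union)


def average_tanimoto(subsets):
    """Mean Tanimoto index over all unordered pairs of selected subsets,
    e.g. the subsets chosen in different folds or resampling runs."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Toy usage: feature indices selected in three hypothetical runs.
toy_runs = [{1, 2, 3}, {2, 3, 4}, {1, 2, 3}]
toy_stability = average_tanimoto(toy_runs)
```

Values near 1 mean the runs selected nearly the same features; values near 0 mean the selections barely overlap.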
Neutron scattering in a d_{x^2-y^2}-wave superconductor with strong impurity scattering and Coulomb correlations
We calculate the spin susceptibility at and below T_c for a d_{x^2-y^2}-wave
superconductor with resonant impurity scattering and Coulomb correlations. Both
the impurity scattering and the Coulomb correlations act to maintain peaks in
the spin susceptibility, as a function of momentum, at the Brillouin zone edge.
These peaks would otherwise be suppressed by the superconducting gap. The
predicted amount of suppression of the spin susceptibility in the
superconducting state compared to the normal state is in qualitative agreement
with results from recent magnetic neutron scattering experiments on
La_{1.86}Sr_{0.14}CuO_4 for momentum values at the zone edge and along the zone
diagonal. The predicted peak widths in the superconducting state, however, are
narrower than those in the normal state, a narrowing which has not been
observed experimentally.
Comment: 24 pages (12 tarred-compressed-uuencoded Postscript figures), REVTeX
3.0 with epsf macros, UCSBTH-94-1